Parsing TV Program to Find Movies

by: Panagiotis Petridis, 8 years ago


So I have really been enjoying the BeautifulSoup tutorial series so far! I decided to put what I learned into practice and made a website that reads the TV program of the top channels in my country (Greece) and checks whether a movie is playing that day. If so, it parses the title, the time, and the channel it is on, and stores them in a .txt file. Then, since I couldn't be bothered to write the code in JavaScript, I used my favorite language (C++) to write a program that generates the .html file (I know this is pretty redundant, but it was something I knew how to do, so I did that). I ran the C++ program, uploaded index.html together with the .css to GitHub, and made it a GitHub page.

Finally, to keep the page updated every day, I wrote a simple shell script that runs the Python file to parse the data, compiles and runs the .cpp file to write index.html, and pushes the changes to GitHub, where the page gets updated. For convenience I also linked a .tk domain so that it would be easier to type.
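The only thing connecting the Python and C++ halves is movies.txt. Judging by parse.py and the C++ reader below, each movie takes three lines (channel, time, title) and the list is closed by a SCRIPT_END line. Here is a minimal Python sketch of that format; format_movies, parse_movies and END_MARKER are hypothetical names, and the file is assumed to be UTF-8 so the Greek text survives the trip.

```python
import os
import tempfile

END_MARKER = "SCRIPT_END"  # sentinel line that closes the list (hypothetical constant name)


def format_movies(movies):
    """Turn (channel, time, title) tuples into the three-lines-per-movie text format."""
    lines = []
    for channel, time, title in movies:
        lines += [channel, time, title]
    lines.append(END_MARKER)
    return "\n".join(lines) + "\n"


def parse_movies(text):
    """Read the three-line records back, stopping at the end marker (or at end of text)."""
    movies = []
    lines = iter(text.splitlines())
    for channel in lines:
        # startswith() also tolerates junk left behind after the marker
        if channel.startswith(END_MARKER):
            break
        time = next(lines, "")
        title = next(lines, "")
        movies.append((channel, time, title))
    return movies


# Round trip through a real file: "w" truncates old contents, utf-8 keeps Greek text intact
path = os.path.join(tempfile.mkdtemp(), "movies.txt")
with open(path, "w", encoding="utf-8") as out:
    out.write(format_movies([("Mega", "01:00", "GreekLet")]))
with open(path, encoding="utf-8") as inp:
    print(parse_movies(inp.read()))  # [('Mega', '01:00', 'GreekLet')]
```

A plain line-based format like this is easy to read from both Python and C++, which is presumably why it works well as the hand-off between the two programs.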

The site is: http://tipaizei.tk/
(the name simply translates to "What's on".tk)

Here is the code:

Note:
There was an issue when I first tried to post this because of the Greek letters, so I replaced them all with the word GreekLet.

parse.py

import bs4 as bs
import urllib.request

sauce = urllib.request.urlopen('http://www.zappit.gr/tv-program').read()

soup = bs.BeautifulSoup(sauce, 'lxml')

rowsLeft = soup.find('div',{'class':'column small-12 medium-6'})

#for row in rowsLeft:
#    channel = row.find('tr').find('span', {'class': 'program__channel-name'}).string
#    print(channel)
#    for channel in row.find('table', {'class': 'program__table_hor'}):

# 'w' empties the file on every run, so nothing from an older, longer run is left
# behind (and the file is created if it doesn't exist yet); utf-8 keeps the Greek text intact
with open("movies.txt", 'w', encoding='utf-8') as out:

    channels = soup.find_all('span', {'class': 'program__channel-name'})
    channelList = []

    for channel in channels:
        #print(channel.string)
        channelList.append(channel.string)

    chnl = ''

    for tr in soup.find_all('tr'):
        # A row that contains a channel name starts that channel's listings
        channelSpan = tr.find('span', {'class': 'program__channel-name'})
        if channelSpan is not None:
            #print(channelSpan.string)
            chnl = channelSpan.string

        # Only the channels listed before OTE Cinema 1 HD are wanted
        if chnl == 'OTE Cinema 1 HD':
            break
        for movie in tr.find_all('td', {'class': 'movie'}):
            name = movie.find('span', {'class': 'program__show'}).find('a').string
            time = movie.find('span', {'class': 'program__hour'}).string
            # .string is None when a tag has more than one child; skip those entries
            if chnl is None or name is None or time is None:
                continue
            #print(chnl)
            out.write(chnl + '\n')
            #print(time)
            out.write(time + '\n')
            #print(name + '\n')
            out.write(name + '\n')

    #print("SCRIPT_END")
    out.write("SCRIPT_END\n")

#for movie in soup.find_all('td', {'class': 'movie'}):
#    name = movie.find('span', {'class': 'program__show'}).find('a').string
#    time = movie.find('span', {'class': 'program__hour'}).string
#    if movie.find('span', {'class': 'program__show'}).find('a').string == None:
#        print("efwfqwfehejfgadsgfkasgfoyuwfegdo22g3iUY@#$YUU!$U!$IF$!$IU$")
#    print("\n" + time)
#    print(name)




######### WITHOUT WRITING TO FILE ########

'''

for channel in channels:
    #print(channel.string)
    channelList.append(channel.string)

chnl = ''

for tr in soup.find_all('tr'):
    if tr.find('span', {'class': 'program__channel-name'}) is not None:
        #print(tr.find('span', {'class': 'program__channel-name'}).string)
        chnl = tr.find('span', {'class': 'program__channel-name'}).string

    if chnl == 'OTE Cinema 1 HD':
        break
    for movie in tr.find_all('td', {'class': 'movie'}):
        name = movie.find('span', {'class': 'program__show'}).find('a').string
        time = movie.find('span', {'class': 'program__hour'}).string
        print("\n" + chnl)
        print(time)
        print(name)

'''
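For comparison, here is the same row-by-row logic from parse.py wrapped in a function, so it can be tried on a small HTML snippet without hitting the live site. This is only a sketch: extract_movies and SAMPLE are made-up names, SAMPLE just imitates the class names parse.py looks for (the real zappit.gr markup isn't shown here and may well have changed), and it uses Python's built-in "html.parser" instead of lxml, plus get_text(strip=True) instead of .string, which returns None whenever a tag has more than one child.

```python
from bs4 import BeautifulSoup


def extract_movies(markup, stop_channel="OTE Cinema 1 HD"):
    """Return (channel, time, title) tuples for every <td class="movie"> cell.

    Follows the loop in parse.py: a row that contains a channel-name span sets
    the current channel, and scanning stops once stop_channel is reached.
    """
    soup = BeautifulSoup(markup, "html.parser")
    movies = []
    channel = None
    for tr in soup.find_all("tr"):
        name_span = tr.find("span", {"class": "program__channel-name"})
        if name_span is not None:
            channel = name_span.get_text(strip=True)
        if stop_channel is not None and channel == stop_channel:
            break
        for cell in tr.find_all("td", {"class": "movie"}):
            show = cell.find("span", {"class": "program__show"})
            hour = cell.find("span", {"class": "program__hour"})
            link = show.find("a") if show is not None else None
            # Skip incomplete cells instead of crashing on a None
            if channel is None or link is None or hour is None:
                continue
            movies.append((channel, hour.get_text(strip=True), link.get_text(strip=True)))
    return movies


# Hypothetical markup that only imitates the class names parse.py looks for
SAMPLE = """
<table>
  <tr><td><span class="program__channel-name">Mega</span></td>
      <td class="movie"><span class="program__hour">01:00</span>
          <span class="program__show"><a href="#">GreekLet</a></span></td></tr>
  <tr><td class="movie"><span class="program__hour">03:10</span>
          <span class="program__show"><a href="#">Another film</a></span></td></tr>
  <tr><td><span class="program__channel-name">OTE Cinema 1 HD</span></td>
      <td class="movie"><span class="program__hour">21:00</span>
          <span class="program__show"><a href="#">Skipped</a></span></td></tr>
</table>
"""

print(extract_movies(SAMPLE))
# [('Mega', '01:00', 'GreekLet'), ('Mega', '03:10', 'Another film')]
```

Keeping the parsing in a function that takes a string also means the network call (urlopen) and the file writing can stay outside it, which makes it easy to test against a saved copy of the page.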


You can uncomment the print statements so that you also see the info in the terminal (it is in Greek, though).

Here is the .cpp file that writes the index.html file:


#include <fstream>
#include <iostream>
#include <string>

using namespace std;

// Writes one MDL card (movie title, time, channel) to the given stream
void addCardSimple(ostream& out, const string& channel, const string& time, const string& movie)
{
    out << "<div class=\"mdl-card mdl-cell mdl-cell--6-col mdl-cell--4-col-tablet mdl-shadow--2dp\">"
           "<div class=\"mdl-card__title\"><h1 class=\"mdl-card__title-text\">" << movie << "</h1></div>"
           "<div class=\"mdl-card__supporting-text\"><p> <strong>" << time << "</strong> <br /> " << channel << " </p></div></div>"
        << endl;
}

int main()
{
    ifstream fin("movies.txt");
    if (!fin) {
        cerr << "Could not open movies.txt" << endl;
        return 1;
    }
    ofstream fout("index.html");

    // Starting tags
    fout << "<html lang=\"en\"><head><!-- Title --><title>Ti Paizei</title>"
            "<link rel=\"icon\" href=\"http://www.freeiconspng.com/uploads/movie-icon-27.png\">"
            "<!-- Get MDL --><link rel=\"stylesheet\" href=\"https://fonts.googleapis.com/icon?family=Material+Icons\">"
            "<link rel=\"stylesheet\" href=\"https://code.getmdl.io/1.2.1/material.indigo-pink.min.css\">"
            "<script defer src=\"https://code.getmdl.io/1.2.1/material.min.js\"></script>"
            "<!-- CSS --><link rel=\"stylesheet\" type=\"text/css\" href=\"style.css\"></head>"
            "<body><div class=\"mdl-layout mdl-js-layout mdl-color--grey-100\"><main class=\"mdl-layout__content\"><div class=\"mdl-grid\">"
         << endl;

    string movie;
    string channel;
    string time;

    // Cards: movies.txt holds three lines per movie (channel, time, title)
    // and ends with a SCRIPT_END line. Stop at that marker or at end of file,
    // so a missing marker can no longer cause an endless loop.
    while (getline(fin, channel)) {
        // Prefix match: any leftover characters after the marker are ignored
        if (channel.rfind("SCRIPT_END", 0) == 0) break;
        getline(fin, time);
        getline(fin, movie);
        addCardSimple(fout, channel, time, movie);
    }

    // Ending Tags
    fout << "</div></main></div></body></html>" << endl;

    /*
    while(true){
        string str;
        fin >> str;
        if(str=="SCRIPT_END")break;
        else{
            channel = str;
            getline (fin,time);
            getline (fin,movie);
            cout << str << " - " << time << " - " << movie << endl;
        }
    }
    */

    // fin and fout are closed automatically when main returns
    return 0;
}

/*
while(true){
    getline (fin,channel);
    if(channel=="SCRIPT_ENDCRIPT_END")break;
    else{
        getline (fin,time);
        getline (fin,movie);
        cout << channel << " - " << time << " - " << movie << endl;
    }
}
*/
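One thing to watch: addCardSimple pastes the scraped text straight into the markup, so a title containing & or < could break the page. If the HTML generation were ever folded into the Python script (skipping the C++ step), a minimal sketch might look like this. card_html and CARD_TEMPLATE are hypothetical names, the markup is copied from addCardSimple, and the escaping is done with html.escape from the standard library.

```python
from html import escape

# Markup copied from addCardSimple; {channel}, {time} and {movie} are filled in below
CARD_TEMPLATE = (
    '<div class="mdl-card mdl-cell mdl-cell--6-col mdl-cell--4-col-tablet mdl-shadow--2dp">'
    '<div class="mdl-card__title"><h1 class="mdl-card__title-text">{movie}</h1></div>'
    '<div class="mdl-card__supporting-text"><p> <strong>{time}</strong> <br /> {channel} </p></div></div>'
)


def card_html(channel, time, movie):
    """Build one card like addCardSimple does, but HTML-escape the scraped text first."""
    return CARD_TEMPLATE.format(
        channel=escape(channel), time=escape(time), movie=escape(movie)
    )


# The "&" in the title comes out as "&amp;", so the markup stays valid
print(card_html("Mega", "01:00", "Tom & Jerry"))
```

For ordinary titles the output is identical to what the C++ program produces, so the existing style.css would keep working unchanged.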


And just in case you are curious, here are the .html and the .css as well:

.html (generated from the C++ program):

<html lang="en"><head><!-- Title --><title>Ti Paizei</title><link rel="icon" href="http://www.freeiconspng.com/uploads/movie-icon-27.png"><!-- Get MDL --><link rel="stylesheet" href="https://fonts.googleapis.com/icon?family=Material+Icons"><link rel="stylesheet" href="https://code.getmdl.io/1.2.1/material.indigo-pink.min.css"><script defer src="https://code.getmdl.io/1.2.1/material.min.js"></script><!-- CSS --><link rel="stylesheet" type="text/css" href="style.css"></head><body><div class="mdl-layout mdl-js-layout mdl-color--grey-100"><main class="mdl-layout__content"><div class="mdl-grid">
<div class="mdl-card mdl-cell mdl-cell--6-col mdl-cell--4-col-tablet mdl-shadow--2dp"><div class="mdl-card__title"><h1 class="mdl-card__title-text">GreekLet</h1></div><div class="mdl-card__supporting-text"><p> <strong>23:15</strong> <br /> GreekLet </p></div></div>
<div class="mdl-card mdl-cell mdl-cell--6-col mdl-cell--4-col-tablet mdl-shadow--2dp"><div class="mdl-card__title"><h1 class="mdl-card__title-text">GreekLet</h1></div><div class="mdl-card__supporting-text"><p> <strong>01:45</strong> <br /> GreekLet </p></div></div>
<div class="mdl-card mdl-cell mdl-cell--6-col mdl-cell--4-col-tablet mdl-shadow--2dp"><div class="mdl-card__title"><h1 class="mdl-card__title-text">GreekLet</h1></div><div class="mdl-card__supporting-text"><p> <strong>23:00</strong> <br /> GreekLet </p></div></div>
<div class="mdl-card mdl-cell mdl-cell--6-col mdl-cell--4-col-tablet mdl-shadow--2dp"><div class="mdl-card__title"><h1 class="mdl-card__title-text">GreekLet!!!</h1></div><div class="mdl-card__supporting-text"><p> <strong>01:00</strong> <br /> Mega </p></div></div>
<div class="mdl-card mdl-cell mdl-cell--6-col mdl-cell--4-col-tablet mdl-shadow--2dp"><div class="mdl-card__title"><h1 class="mdl-card__title-text">GreekLet</h1></div><div class="mdl-card__supporting-text"><p> <strong>15:00</strong> <br /> GreekLet </p></div></div>
<div class="mdl-card mdl-cell mdl-cell--6-col mdl-cell--4-col-tablet mdl-shadow--2dp"><div class="mdl-card__title"><h1 class="mdl-card__title-text">GreekLet</h1></div><div class="mdl-card__supporting-text"><p> <strong>02:05</strong> <br /> GreekLet </p></div></div>
<div class="mdl-card mdl-cell mdl-cell--6-col mdl-cell--4-col-tablet mdl-shadow--2dp"><div class="mdl-card__title"><h1 class="mdl-card__title-text">GreekLet</h1></div><div class="mdl-card__supporting-text"><p> <strong>22:00</strong> <br /> GreekLet </p></div></div>
<div class="mdl-card mdl-cell mdl-cell--6-col mdl-cell--4-col-tablet mdl-shadow--2dp"><div class="mdl-card__title"><h1 class="mdl-card__title-text">GreekLet</h1></div><div class="mdl-card__supporting-text"><p> <strong>12:05</strong> <br /> GreekLet </p></div></div>
<div class="mdl-card mdl-cell mdl-cell--6-col mdl-cell--4-col-tablet mdl-shadow--2dp"><div class="mdl-card__title"><h1 class="mdl-card__title-text">GreekLet</h1></div><div class="mdl-card__supporting-text"><p> <strong>05:00</strong> <br /> GreekLet </p></div></div>
</div></main></div></body></html>


And the .css (this one is slightly easier to read, since I wrote it myself):

.mdl-grid {
  max-width: 600px;
}
.mdl-card__media {
  margin: 0;
}
.mdl-card__media > img {
  max-width: 100%;
}
.mdl-card__actions {
  display: flex;
  box-sizing:border-box;
  align-items: center;
}
.mdl-card__actions > .mdl-button--icon {
  margin-right: 3px;
  margin-left: 3px;
}

@media screen and (max-width: 500px) {

  .mdl-card{
    max-width: 100%;
  }

}


Also, here is the shell script I wrote. It isn't anything fancy, but I thought I might as well include it.


#!/bin/bash
# Stop at the first failing step, so a failed scrape or build never gets pushed
set -e

echo "Parsing data from the Internet"
python3 parse.py
echo "Updating index.html"
g++ update.cpp
./a.out
echo "Pushing update to GitHub"
git add index.html
git status
git commit -m "Daily Update"
git push
echo "Successfully updated index.html"


I hope you find my code useful and that it inspires you to work on your own personal projects. Special thanks to Harrison for all the great videos on Beautiful Soup.






Awesome, thanks for sharing your work with us!

-Harrison 8 years ago



Sure, no problem!

-Panagiotis Petridis 8 years ago


Also, here's the page on GitHub:
https://github.com/PanagiotisPtr/tipaizei.github.io

-Panagiotis Petridis 8 years ago
